
Introduction

The representation of women in media has long been a topic of debate and scrutiny. While society has made significant progress toward gender equality over the past century, it is worth examining whether these changes are reflected in the films we watch. Movies can reveal societal norms and ideals, as well as the underlying beliefs and attitudes that shape our culture.

For this data analysis project, we use the CMU Movie Summary Corpus, together with additional data from Stanford CoreNLP, IMDb, Wikidata and Box Office Mojo, to explore how women are portrayed in film: as actresses, as characters, and as writers and directors. Through our analysis, we hope to better understand how women have been depicted on screen over time and how this representation has evolved. Examining these factors gives insight into how society views and treats women, and into how far we have come in addressing gender inequality.

The Data

Our analysis is based on merging the CMU dataset, the Stanford CoreNLP-processed plot summaries, IMDb, Wikidata and Box Office Mojo.

We have separated the data into three tables:

The Impact Score metric

Movies

We created a metric to measure a movie's impact, based on its average IMDb rating and its number of votes. Our assumption is that an impactful movie has many votes and an extremely good or extremely bad average rating.

We apply a logarithmic transformation to the number of votes, since vote counts follow a power-law distribution: a few movies collect millions of votes while most receive far fewer. The log compresses this range before we normalize it, so that vote counts can be meaningfully compared across movies. Next, we take the absolute value of the normalized average rating. Ratings far above and far below the average both produce large values, so very good and very bad movies are both rewarded, since either can leave a strong impression on audiences. Multiplying the two factors gives a single impact score that we can compare across films.

\[\textrm{Impact Score}_\textrm{Movies} = \textrm{normalized} (\log(\textrm{number of votes})) \cdot \textrm{abs}(\textrm{normalized}(\textrm{IMDB rating}))\]
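The formula can be sketched in a few lines of Python. The text does not say which normalization is used. This sketch assumes z-score standardization (subtract the dataset mean, divide by the standard deviation) and the natural logarithm. The function names `zscore` and `impact_scores` are hypothetical and not taken from the project's code.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and population standard deviation 1.

    Assumes the values are not all identical (otherwise the deviation is 0).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def impact_scores(ratings, votes):
    """Impact score per movie: normalized(log(votes)) * |normalized(rating)|.

    ASSUMPTION: "normalized" is read as z-score standardization over the
    whole dataset; the project's actual normalization may differ.
    """
    # Log-transform the heavy-tailed vote counts before standardizing them.
    z_votes = zscore([math.log(v) for v in votes])
    # Distance of each rating from the mean rating, in standard deviations.
    z_ratings = zscore(ratings)
    # abs() makes very good and very bad ratings count equally.
    return [zv * abs(zr) for zv, zr in zip(z_votes, z_ratings)]


# Hypothetical toy data: one great, one average and one terrible movie.
scores = impact_scores([9.0, 5.0, 1.0], [100_000, 1_000, 100_000])
```

In this toy example, the movies rated 9.0 and 1.0 have the same vote count and both score about 0.866 (√3/2), while the average-rated movie scores zero. Under z-scoring, the base of the logarithm does not affect the result. Movies with a below-average (log) vote count, however, receive negative scores.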

According to this metric, these are the 10 most impactful movies in our dataset (listed by number of votes):

| Title | Average rating | Number of votes | Impact score |
|---|---|---|---|
| The Shawshank Redemption | 9.3 | 2,648,879 | 9.897882 |
| The Dark Knight | 9.0 | 2,620,838 | 8.914723 |
| Inception | 8.8 | 2,322,848 | 8.149376 |
| Fight Club | 8.8 | 2,093,849 | 8.047722 |
| Forrest Gump | 8.8 | 2,051,278 | 8.027604 |
| Pulp Fiction | 8.9 | 2,027,513 | 8.329917 |
| The Matrix | 8.7 | 1,894,094 | 7.638405 |
| The Lord of the Rings: The Fellowship of the Ring | 8.8 | 1,851,387 | 7.927186 |
| The Godfather | 9.2 | 1,836,155 | 9.158800 |
| The Lord of the Rings: The Return of the King | 9.0 | 1,824,685 | 8.532330 |

Actors, writers and directors

To extend this metric to actors, writers and directors, we use Discounted Cumulative Gain (DCG).

For each actor, writer or director, we first rank the movies they are credited on by impact score, in decreasing order. We then compute the discounted cumulative gain over this ranked list. Each movie's score is divided by the base-2 logarithm of its rank plus one, so lower-ranked movies count for progressively less:

\[\textrm{Impact Score}_\textrm{Actors, Directors, Writers} = \sum_{i=1}^{\textrm{number of movies}}\frac{\textrm{Impact Score}_{\textrm{Movies},\,i}}{\log_2(i + 1)}\]
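The ranking and discounting steps can be sketched in Python as follows. This sketch assumes each movie's impact score has already been computed. The helper names (`dcg_impact`, `people_impact`) and the `(person, movie_id)` credit format are hypothetical and not taken from the project's code.

```python
import math
from collections import defaultdict


def dcg_impact(movie_scores):
    """Discounted cumulative gain of one person's movie impact scores.

    Scores are ranked in decreasing order; the movie at rank i (starting at 1)
    contributes score / log2(i + 1), so the top movie counts in full.
    """
    ranked = sorted(movie_scores, reverse=True)
    return sum(score / math.log2(rank + 1) for rank, score in enumerate(ranked, start=1))


def people_impact(credits, movie_impact):
    """Map each person to the DCG of the movies they are credited on.

    credits: iterable of (person, movie_id) pairs for one role (actors,
    directors or writers); movie_impact: dict from movie_id to impact score.
    Both input formats are hypothetical.
    """
    scores_by_person = defaultdict(list)
    for person, movie_id in credits:
        scores_by_person[person].append(movie_impact[movie_id])
    return {person: dcg_impact(scores) for person, scores in scores_by_person.items()}


# Hypothetical toy data: "A" is credited on two movies, "B" on one.
impact = people_impact([("A", "m1"), ("A", "m2"), ("B", "m2")], {"m1": 9.0, "m2": 6.0})
```

Here "A" gets 9.0 + 6.0 / log2(3) and "B" gets 6.0. Because this is plain DCG rather than a normalized variant, the score is not divided by the number of movies. Every additional movie with a positive score therefore adds a (discounted) amount, which rewards long careers as well as strong ones.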

Here are the 10 actors, directors and writers with the highest impact scores:

| Actor | Impact score | Director | Impact score | Writer | Impact score |
|---|---|---|---|---|---|
| Samuel L. Jackson | 47.284516 | Steven Spielberg | 35.521989 | Stephen King | 35.702654 |
| Robert De Niro | 45.917416 | Martin Scorsese | 34.013618 | George Lucas | 29.177271 |
| Michael Caine | 42.677866 | Alfred Hitchcock | 30.919909 | Christopher Nolan | 29.138123 |
| Morgan Freeman | 42.382440 | Christopher Nolan | 29.116314 | Bob Kane | 28.512753 |
| Al Pacino | 39.385457 | Francis Ford Coppola | 27.786309 | Quentin Tarantino | 27.295414 |
| Bruce Willis | 38.878472 | Quentin Tarantino | 26.343020 | Francis Ford Coppola | 26.896934 |
| Gary Oldman | 37.168668 | Akira Kurosawa | 24.816421 | Akira Kurosawa | 26.660942 |
| Robert Duvall | 36.768965 | Stanley Kubrick | 24.712699 | David S. Goyer | 25.182420 |
| Tom Hanks | 36.712741 | Clint Eastwood | 23.366581 | Billy Wilder | 24.220891 |
| Brad Pitt | 36.554337 | Uwe Boll | 22.238082 | Hayao Miyazaki | 24.018087 |

Where are the Women?

The numbers on women's representation in media are not encouraging. In recent years, women have made up only a small percentage of actors, directors and screenwriters in the film industry.

Across genres, women are most often represented in dramas and comedies and underrepresented in action and sci-fi films.
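A genre comparison like this can be computed as the share of female roles per genre. The following is a minimal sketch under several assumptions. Cast records are taken to be already flattened into `(genre, gender)` pairs, one pair per credited role and per genre of its movie, since a movie can have several genres. Gender is assumed to be coded as `"F"` or `"M"`. The function name and input format are hypothetical.

```python
from collections import Counter


def female_share_by_genre(roles):
    """Fraction of credited roles played by women, per genre.

    roles: iterable of (genre, gender) pairs, one per role and per genre of
    the role's movie. Roles whose gender is not "F" or "M" are skipped.
    """
    total = Counter()
    female = Counter()
    for genre, gender in roles:
        if gender not in ("F", "M"):
            continue  # unknown or missing gender: leave out of both counts
        total[genre] += 1
        if gender == "F":
            female[genre] += 1
    return {genre: female[genre] / total[genre] for genre in total}


# Hypothetical toy data, not taken from the dataset.
shares = female_share_by_genre([
    ("Drama", "F"), ("Drama", "M"),
    ("Action", "M"), ("Action", "F"), ("Action", "M"), ("Action", "M"),
    ("Comedy", None),
])
```

On this toy input, Drama comes out at 0.5 and Action at 0.25. The Comedy role is dropped because its gender is unknown, so Comedy does not appear in the result.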

Behind the Camera

Among directors, men outnumber women by a wide margin. Still, some female directors have had a significant impact on the industry.

Some of the most successful female directors in recent years include Patty Jenkins, who directed the critically acclaimed Wonder Woman, and Ava DuVernay, who directed the Oscar-nominated Selma.

In Front of the Camera

Women are similarly underrepresented among actors in the film industry, although some actresses have made a significant impact and achieved great success in their careers.

Some of the highest-grossing actresses in recent years include Gal Gadot, who starred in the Wonder Woman franchise, and Emma Stone, who won an Oscar for her role in La La Land.

On the Screen

The representation of women on screen is also limited, with many female characters being relegated to supporting roles or falling into stereotypical tropes.

Some of the most successful female characters in recent years include Wonder Woman, played by Gal Gadot, and Katniss Everdeen, played by Jennifer Lawrence in The Hunger Games series.

Women of Impact

Despite the challenges women face in the film industry, many have made a significant impact and achieved great success in their roles.

These women have not only excelled in their careers, but have also challenged stereotypes and paved the way for future generations of women in media.

That’s a Wrap!

In conclusion, the representation of women in media remains limited and often stereotypical. Yet many talented and successful women in the industry are making a significant impact. The industry should keep striving for greater diversity and representation, so that films portray women more accurately and fairly.